A Conditional Expectation Approach to Model Selection and Active Learning under Covariate Shift
Abstract
In the previous chapter, Kanamori and Shimodaira provided generalization error estimators that can be used for model selection and active learning. The accuracy of these estimators is theoretically guaranteed in terms of the expectation over realizations of training input-output samples. In practice, however, we are given only a single realization of the training samples, so ideally we want an estimator of the generalization error that is accurate in each single trial. We may not be able to avoid taking the expectation over the training output noise, since the realized value of the noise is generally unknown. On the other hand, the locations of the training input points are accessible by nature. Motivated by this fact, we propose to estimate the generalization error without taking the expectation over the training input points. That is, we evaluate the unbiasedness of the generalization error estimator in terms of the conditional expectation over the training output noise given the training input points.

1 Conditional Expectation Analysis of Generalization Error

In order to illustrate a possible advantage of the conditional expectation approach, let us consider a simple model selection scenario where we have only one training sample $(x, y)$ (see Figure 1).

Figure 1: Schematic illustrations of the conditional expectation and full expectation of the generalization error. (a) Generalization error for model $M_1$; (b) generalization error for model $M_2$.

The solid curves in Figure 1(a) depict $G_{M_1}(y \mid x)$, the generalization error for a model $M_1$ as a function of the (noisy) training output value $y$ given a training input point $x$. The three solid curves correspond to the cases where the realization of the training input point $x$ is $x'$, $x''$, and $x'''$, respectively. The generalization error of the model $M_1$ in the full expectation approach is depicted by the dash-dotted line, where the expectation is taken over both the training input point $x$ and the training output value $y$ (this corresponds to the mean of the three solid curves). The generalization errors in the conditional expectation approach are depicted by the dotted lines, where the expectation is taken only over the training output value $y$, conditioned on $x = x'$, $x''$, and $x'''$, respectively (this corresponds to the mean value of each solid curve). Figure 1(b) depicts the generalization errors of a model $M_2$ in the same manner.

In the full expectation framework, the model $M_1$ is judged to be better than $M_2$ regardless of the realization of the training input point, since the dash-dotted line in Figure 1(a) is lower than that in Figure 1(b). However, $M_2$ is actually better than $M_1$ if $x''$ or $x'''$ is realized as $x$. In the conditional expectation framework, the goodness of a model is evaluated adaptively, depending on the realization of the training input point $x$. This illustrates that the conditional expectation framework can indeed provide a better model choice than the full expectation framework.
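This behavior can be reproduced numerically. The following is a minimal Monte Carlo sketch, not taken from the chapter: the target $f(x) = \sin x$, the noise level, the two one-parameter models ($M_1$: constant, $M_2$: proportional), and the input distributions are all illustrative assumptions. It fits each model to a single training sample $(x, y)$ and compares the conditional expectation of the test error given the realized $x$ with the full expectation over both $x$ and $y$.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- illustrative setup (not from the chapter): target, noise, models, distributions ---
f = np.sin                                   # "true" target function
sigma = 0.3                                  # noise standard deviation (variance sigma^2)
x_te = rng.uniform(-1.0, 1.0, 2_000)         # test inputs ~ P_te

def gen_error(model, x, y):
    """Test error of a one-parameter model least-squares-fitted to one sample (x, y).
    M1: fhat(x) = theta      -> theta = y
    M2: fhat(x) = theta * x  -> theta = y / x
    """
    pred = np.full_like(x_te, y) if model == "M1" else (y / x) * x_te
    return np.mean((pred - f(x_te)) ** 2)

def conditional_G(model, x, n_mc=2_000):
    """Expectation over the output noise only, conditioned on the realized input x."""
    ys = f(x) + sigma * rng.standard_normal(n_mc)
    return np.mean([gen_error(model, x, y) for y in ys])

# Conditional expectation: the preferred model may change with the realized input point.
for x in (0.1, 0.5, 2.0):                    # three possible realizations x', x'', x'''
    g1, g2 = conditional_G("M1", x), conditional_G("M2", x)
    print(f"x = {x}:  G_M1 = {g1:.3f}  G_M2 = {g2:.3f}  ->  choose {'M1' if g1 < g2 else 'M2'}")

# Full expectation: one fixed ranking, additionally averaged over x ~ P_tr.
x_tr = rng.uniform(0.05, 2.0, 200)
for m in ("M1", "M2"):
    print(m, "full expectation:", round(np.mean([conditional_G(m, x, n_mc=100) for x in x_tr]), 3))
```

Under these illustrative choices, the full-expectation criterion picks a single model for all realizations, while the conditional ranking changes with the realized input point, which is exactly the situation Figure 1 depicts.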
In this chapter, we address the problems of model selection and active learning in the conditional expectation framework. The rest of this chapter is organized as follows. After the problem formulation in Section 2, we introduce a model selection criterion (Section 3) and an active learning criterion (Section 4) in the conditional expectation framework and show that they are more advantageous than the full expectation methods in the context of approximately linear regression. Then we discuss how model selection and active learning can be combined in Section 5. Finally, we give concluding remarks and future prospects in Section 6.

2 Linear Regression under Covariate Shift

In this section, we formulate a linear regression problem with covariate shift.

2.1 Statistical Formulation of Linear Regression

Let us consider a regression problem of estimating an unknown input-output dependency from training samples. Let $\{(x_i, y_i)\}_{i=1}^{n}$ be the training samples, where $x_i \in \mathcal{X} \subset \mathbb{R}$ is an i.i.d. training input point following a probability distribution $P_{\mathrm{tr}}(x)$ and $y_i \in \mathcal{Y} \subset \mathbb{R}$ is a corresponding training output value following a conditional probability distribution $P(y \mid x = x_i)$. We denote the conditional mean of $P(y \mid x)$ by $f(x)$ and assume that the conditional variance is $\sigma^2$, independent of $x$. Then $P(y \mid x)$ may be regarded as consisting of the true output $f(x)$ plus additive noise $\epsilon$ with mean $0$ and variance $\sigma^2$ (see Figure 2). Let us employ a linear regression model for learning $f(x)$.

Figure 2: Regression problem of learning $f(x)$ from $\{(x_i, y_i)\}_{i=1}^{n}$, where $\{\epsilon_i\}_{i=1}^{n}$ are i.i.d. noise terms with mean zero and variance $\sigma^2$, and $\hat{f}(x)$ is a learned function.
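To make this formulation concrete, the following is a minimal sketch of data generation and least-squares fitting under covariate shift. Everything specific in it is an illustrative assumption rather than the chapter's setup: the target $f$, the Gaussian input distributions standing in for $P_{\mathrm{tr}}(x)$ and the shifted test distribution, the noise level $\sigma$, and the two-parameter linear model.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- illustrative choices (not from the chapter) ---
f = np.sinc                                        # unknown target function f(x)
n, sigma = 100, 0.1                                # sample size, noise std (variance sigma^2)
x_tr = rng.normal(1.0, 0.5, n)                     # training inputs ~ P_tr(x)
y_tr = f(x_tr) + sigma * rng.standard_normal(n)    # y_i = f(x_i) + eps_i
x_te = rng.normal(2.0, 0.5, n)                     # test inputs ~ P_te(x) != P_tr(x): covariate shift

# Linear regression model fhat(x) = theta_0 + theta_1 * x, fitted by ordinary least squares.
Phi = np.column_stack([np.ones(n), x_tr])          # design matrix
theta, *_ = np.linalg.lstsq(Phi, y_tr, rcond=None)
fhat = lambda x: theta[0] + theta[1] * x

print("error on training region:", np.mean((fhat(x_tr) - f(x_tr)) ** 2))
print("error on shifted test region:", np.mean((fhat(x_te) - f(x_te)) ** 2))
```

Because the fitted model is only an approximation of $f$, least squares tunes it to where $P_{\mathrm{tr}}(x)$ puts its mass; under covariate shift the error on the shifted test region is typically much larger, which is why accurate generalization error estimation matters for model selection and active learning in this setting.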
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008